Abstract. — A phylogeny of the mosquito subfamily Anophelinae was inferred from fragments of
two protein-coding nuclear genes, G6pd (462 bp) and white (801 bp), and from a combined data set
(2,136 bp) that included a portion of the mitochondrial gene ND5 and the D2 region of the ribosomal
28S gene. Sixteen species from all three anopheline genera and six Anopheles subgenera were
sampled, along with six species of other mosquitoes used as an outgroup. Each of four genes analyzed
individually recovered the same well-supported clades; topological incongruence was limited to
unsupported or poorly supported nodes. As assessed by the incongruence length difference test, most
of the conflicting signal was contributed by third codon positions. Strong structural constraints, as
observed in white and G6pd, apparently had little impact on phylogenetic inference. Compared with
the other genes, white provided a superior source of phylogenetic information. However, white appears
to have experienced accelerated rates of evolution in a few lineages, the affinities of which are
therefore suspect. In combined analyses, most of the inferred relationships were well-supported and in
agreement with previous studies: monophyly of Anophelinae, basal position of Chagasia, monophyly
of Anopheles subgenera, and subgenera Nyssorhynchus + Kerteszia as sister taxa. The results also
suggested a monophyletic origin of subgenera Cellia + Anopheles, and the white gene analysis supported
genus Bironella as a sister taxon to Anopheles. The present data and other available evidence suggest a
South American origin of Anophelinae, probably in the Mesozoic; a rapid diversification of Bironella and
basal subgeneric lineages of Anopheles, potentially associated with the breakup of Gondwanaland; and
a relatively recent and rapid dispersion of subgenus Anopheles. [Anopheles; biogeography; evolution;
G6pd; mosquitoes; phylogeny; simultaneous analysis; white.]


Anopheline mosquitoes (Culicidae, Anophelinae) are of prime medical importance as human
malaria vectors, yet their phylogeny is poorly known. Traditionally, the subfamily is subdivided
into three genera: Anopheles, Bironella, and Chagasia. Chagasia, a Neotropical genus, is regarded
as sister to the other genera (Ross, 1951; Harbach and Kitching, 1998). Anopheles, with 97% of all
anopheline species, is the most diversified genus, with 437 species classified into six subgenera:
the cosmopolitan Anopheles, Old World Cellia, and the Neotropical Kerteszia, Nyssorhynchus,
Lophopodomyia, and Stethomyia. Previous studies of relationships within Anophelinae have been
taxon-limited, but some of them have hinted that the existing classification does not reflect
natural groups (Cohn, pers. comm.; Foley et al., 1998). Recently, a comprehensive morphology-based
analysis of Anophelinae phylogeny was conducted by Sallum et al. (2000), who hypothesized that
the subgenus Anopheles, as traditionally defined, is paraphyletic. Accordingly, they proposed a
change of the existing status of genus Bironella and subgenera Stethomyia and Lophopodomyia
into informal groups within the subgenus Anopheles.

In contrast to their morphology-based hypothesis regarding the status of subgenus Anopheles,
molecular evidence tends to support traditional systematics. The analyses of nuclear white
(Besansky and Fahey, 1997) and mitochondrial COII (Foley et al., 1998) genes placed Bironella as
sister to Anopheles lineages. Because those results might have been biased as a result of limited
taxon sampling, Krzywinski et al. (2001) addressed the issue of Anophelinae phylogeny by using
the mitochondrial ND5 gene and the D2 region of ribosomal 28S nuclear gene sequences from an
expanded sample of taxa. Although no decisive support for Bironella as a sister taxon to Anopheles
was shown, their data also favored traditional relationships and rejected the hypothesis of close
affinity between Bironella and the subgenus Anopheles. They hypothesized that lack of resolution
of those relationships reflected the rapid radiation of the Bironella and Anopheles subgeneric
lineages. However, poorly supported relationships might have resulted from bias introduced
through mutational saturation and unequal evolutionary rates among lineages observed in those
genes.

A better picture of anopheline evolution not only would allow us to understand the
relationships within the group, but also might help answer more fundamental biological
questions concerning causes of the peculiar geographic distribution of Anophelinae. However,
this area of research has been virtually neglected. To approach this goal, additional
phylogenetic information is needed, preferably from an alternative source. Promising molecular
candidates are protein-coding nuclear genes, which are not as strongly biased in nucleotide
composition as mitochondrial genes and, in contrast to rDNA, are relatively easy to align
(Brower and DeSalle, 1994). Here we analyze sequences of two protein-coding single-copy nuclear
genes, glucose-6-phosphate dehydrogenase (G6pd) and white, from anophelines.
Glucose-6-phosphate dehydrogenase (G6pd) plays a key role in regulating carbon flow through the
pentose shunt pathway. The enzyme is considered to have an important housekeeping function
and for this reason is expected to be relatively conservative in terms of amino acid changes.
Soto-Adames et al. (1994) showed that G6pd was informative for insect systematics over a very
broad range, from sibling species to the ordinal level. The protein product of the white gene
belongs to a superfamily of Traffic ATPase membrane transporters and helps transport eye
pigment precursors, guanine and tryptophan, into pigment cells (Ewart et al., 1994). The gene
was useful for reconstructing higher-level relationships in mosquitoes (Besansky and Fahey,
1997).

To determine the phylogenetic relationships within Anophelinae and to test the hypothesis of
rapid radiation of the group, we have performed maximum parsimony and maximum likelihood
analyses of the G6pd and white gene fragments. Further, we explore the influence of unequal
evolutionary rates and structural constraints, two attributes of the sequence data detected in
white and G6pd, on the inference of Anophelinae phylogeny. Because three of the four loci
available for a simultaneous analysis appear incongruent with each other, we have attempted to
localize the source of conflict and address the issue of treatment of multiple data sets
containing conflicting information. We use the inferred trees to evaluate the phylogenetic
hypothesis of Sallum et al. (2000). In addition, we propose a hypothesis for the evolutionary
history of Anophelinae in a biogeographic framework.

Materials  and  Methods 

The present data set contains 16 species of Anophelinae representing all anopheline genera
and Anopheles subgenera and 6 species of other mosquitoes used as an outgroup (Table 1). All
species except a representative of the subgenus Lophopodomyia were included in our previous
analysis of rDNA and mtDNA genes (Krzywinski et al., 2001). Discussion of outgroup sampling is
also presented there.

Genomic DNA was extracted following Collins et al. (1987) and resuspended in 100 μl of TE
buffer (10 mM Tris, 1 mM EDTA), pH 7.4. Sequences of the white gene primers WZ2E, WZ4E, and
WZ11X used for polymerase chain reaction (PCR) are given in Zwiebel et al. (1995). G6PDF and
G6PDR are modified from Soto-Adames et al. (1994), by exclusion of the Kpn I linker. Internal
primers were used in conjunction with the flanking primers to facilitate amplification of the
white and G6pd genes from more difficult templates (Fig. 1, Table 2). PCR amplification
conditions for the white gene were as described previously (Besansky and Fahey, 1997). G6pd was
amplified in 50 μl (total volume) with 2.5 mM MgCl2, 50 mM KCl, 10 mM Tris-HCl (pH 8.3),
0.001% gelatin, 200 μM each dNTP (GibcoBRL), 50 pmol of each primer, 2.5 U of Taq polymerase
(GibcoBRL), and 1 μl of template DNA. Amplification was performed in the Perkin-Elmer 9600
thermocycler, with an initial denaturation at 94°C for 3 min, followed by 35 cycles of 94°C for
15 s, 50°C for 15 s, and 72°C for 60 s, followed by the final elongation step at 72°C for
10 min. PCR products were cloned directly into pGEM-T vectors (Promega). Cloned products were
PCR-amplified, purified (StrataPrep PCR purification kit, Stratagene), and sequenced by using
ABI BigDye terminator chemistry (Perkin-Elmer Applied Biosystems) on an ABI 377 sequencer.
Sequences of both strands were
obtained  from  single  clones.  The  sequenc¬ 
ing  error  introduced  by  this  method  (3x1 0~^, 
estimated  by  Kwiatowski  et  al.,  1991) 
Table 1. List of taxa examined, with geographical distribution of anophelines.

Subfamily    Genus           Subgenus        Species                             Distribution
Anophelinae  Anopheles       Anopheles       coustani Laveran                    Afrotropical/Palearctic
                                             intermedius (Peryassu)              Neotropical
                                             mattogrossensis Lutz and Neiva      Neotropical
                                             quadrimaculatus Say                 Nearctic
                                             pseudopunctipennis Theobald         Neotropical
                             Cellia          gambiae Giles                       Afrotropical
                                             stephensi Liston                    Oriental
                             Kerteszia       bellator Dyar and Knab              Neotropical
                                             cruzii Dyar and Knab                Neotropical
                                             neivai Howard, Dyar and Knab        Neotropical
                             Lophopodomyia   squamifemur Antunes                 Neotropical
                             Nyssorhynchus   albimanus Wiedemann                 Nearctic/Neotropical
                                             albitarsis Lynch Arribalzaga        Neotropical
                             Stethomyia      kompi Edwards                       Neotropical
             Bironella       Bironella       gracilis Theobald                   Australasian
             Chagasia                        bathana (Dyar)                      Neotropical
Culicinae    Aedeomyia                       squamipennis (Lynch Arribalzaga)
             Armigeres                       subalbatus (Coquillett)
             Orthopodomyia                   alba Baker
             Toxorhynchites                  amboinensis (Doleschall)
                                             rutilus (Coquillett)
             Uranotaenia                     sapphirina (Osten Sacken)

should have no influence on the results of phylogenetic analyses for this level of divergence.
Sequences have been deposited in GenBank, with accession numbers AF317805-AF317824 for G6pd
and AF318192-AF318209 for white. The white gene sequences of An. albimanus, An. gambiae,
Bi. gracilis, and Toxorhynchites rutilus were obtained from GenBank (accession numbers U73839,
U29486, U73829, and U73836, respectively).
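The internal primers used here are degenerate (Table 2 writes alternative bases in parentheses). As a minimal sketch of what that notation implies, the following Python snippet expands a primer written in this style into its component oligonucleotides and counts them. The example sequence is hypothetical and is not one of the primers from this study; inosine (I), if present, is simply treated as a single fixed symbol.

```python
from itertools import product


def parse_degenerate(primer):
    """Split a primer such as 'GA(AG)AC' into a list of per-position choices."""
    positions = []
    i = 0
    while i < len(primer):
        if primer[i] == "(":
            close = primer.index(")", i)        # raises ValueError if unbalanced
            positions.append(primer[i + 1:close])
            i = close + 1
        else:
            positions.append(primer[i])
            i += 1
    return positions


def degeneracy(primer):
    """Number of distinct oligonucleotides represented by the primer."""
    n = 1
    for choices in parse_degenerate(primer):
        n *= len(choices)
    return n


def expand(primer):
    """List every oligonucleotide represented by the primer."""
    return ["".join(p) for p in product(*parse_degenerate(primer))]


# Hypothetical primer, used for illustration only.
example = "GA(AG)AC(CT)TGG"
print(degeneracy(example))
print(expand(example))
```

Degeneracy grows multiplicatively with each ambiguous position, which is one reason inosine is often substituted at highly ambiguous sites.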

The identities of the sequences were confirmed by comparison of the conceptual translation
obtained with TRANSLATE (Genetics Computer Group [GCG], 1997) to the sequences published by
Besansky and Fahey (1997) and Soto-Adames et al. (1994). Before


[Figure 1 diagram. Panel labels: white (primers WZ2E, WZ4E); G6pd (primers G6PDF, G6PDintF,
G6PDintR, G6PDR; exons II, III, and IV).]

Figure 1. Structure of the white and G6pd genes and the strategy for their amplification. The
amplified fragments of exons are represented by open boxes; nonamplified regions are hatched.
Lines connecting boxes are introns; horizontal arrows are primers. The location of an
additional intron in the white gene of nonanophelines is indicated by a vertical arrow.


amino acid alignment with PILEUP (GCG), sequences corresponding to introns were identified and
removed. After visual inspection, slight manual adjustments were performed in the white gene
alignment. Nucleotide sequences of both genes were aligned according to the resulting amino
acid alignments.
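Aligning nucleotides "according to" an amino acid alignment amounts to threading each codon under its residue and inserting three gap characters wherever the protein alignment has a gap. The sketch below illustrates that idea in Python; it is not the procedure or software used by the authors, and the function name and the toy sequences are assumptions made for illustration.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Return the coding sequence aligned codon by codon to a gapped protein.

    aligned_protein: one row of a protein alignment (gaps allowed)
    cds: the unaligned, intron-free coding sequence for the same taxon
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("coding sequence length does not match protein length")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)          # one protein gap = three nucleotide gaps
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


# Toy example: Met-Lys-gap-Leu.
print(thread_codons("MK-L", "ATGAAACTG"))
```

Because gaps are always inserted in multiples of three, the reading frame is preserved and codon positions can later be weighted or excluded by column index.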

Phylogenetic analyses based on maximum parsimony (MP) and maximum likelihood (ML) were carried
out with PAUP* 4.0d65 (Swofford, 1999), using heuristic searches and TBR branch-swapping. MP
analyses were done by stepwise random addition of taxa with 1,000 replications; confidence in
the inferred topologies was estimated by bootstrapping (500 bootstrap pseudoreplicates, each
with 10 random additions of sequences). Apart from equal weighting, three other weighting
schemes were applied to explore the influence of potential multiple substitutions on recovery
of basal anopheline relationships: third position transitions given zero weight (nt3Ti = 0),
third positions given zero weight (nt3 = 0), and amino acid sequences. The models of DNA
substitution used in the ML analyses that best fit each of the data sets were determined by a
likelihood ratio test, using MODELTEST 2.0 (Posada and Crandall, 1998). Probabilities of
substitution classes and the Γ shape parameter (α) used in the subsequent analyses were
estimated iteratively from the data by


2001    KRZYWINSKI ET AL.—ANOPHELINAE PHYLOGENY    543

Table 2. Primer sequences used to generate the white and G6pd fragments.

Primer name   Sequence (5' to 3')*
WZ5X          XCC(AG)Tr(AGT)AT(AG)TTCATIAaCC
WZ7X          XTC(AG)AAIAC(AG)Tr(TC)TC(AG)AAIGTCAT(AG)TnGT
G6PDintF      GAA(AG)AAGT(AT)(CT)GA(ACT)GAGTnTGG
G6PDintR      'rrCTC(AC)AC(AG)AT(AGT)AT(AC)C(GT)(AG)TrCCA

*X is an Xba I linker (5'-CGCTCTAGA-3'); degeneracy is indicated by parentheses; I is inosine.


using the "tree scores" PAUP* option. Trees obtained from the unweighted MP analysis were used
for an initial estimation of the parameters. The parameters were fixed in a ML heuristic search
and the resulting tree was used to reoptimize the parameter values. Parameter estimation and
tree searching were continued until both parameters and tree likelihood stabilized. For maximum
likelihood tree searches and bootstrap analyses we used 100 replications. Potential effects of
base frequency differences among taxa on phylogenetic reconstruction were explored by
implementing LogDet/paralinear transformation (Lockhart et al., 1994) to calculate
evolutionary distances and construct minimum-evolution trees.
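To make the LogDet/paralinear transformation mentioned above more concrete, here is a minimal pure-Python sketch of a paralinear distance between two aligned sequences. It builds the 4 x 4 matrix F of joint base frequencies and returns -1/4 ln[det F / sqrt(prod f_x prod f_y)], where f_x and f_y are the marginal base frequencies. The 1/4 scaling is one common convention and is an assumption here, as are the function names; the authors computed these distances in PAUP*, not with this code.

```python
import math

BASES = "ACGT"


def divergence_matrix(seq_x, seq_y):
    """4 x 4 matrix of joint base frequencies over sites where both bases are A/C/G/T."""
    counts = [[0.0] * 4 for _ in range(4)]
    n = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in BASES and b in BASES:
            counts[BASES.index(a)][BASES.index(b)] += 1
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return [[c / n for c in row] for row in counts]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def paralinear_distance(seq_x, seq_y):
    """Paralinear distance; undefined (ValueError) if the determinant is not positive."""
    f = divergence_matrix(seq_x, seq_y)
    row_freqs = [sum(row) for row in f]
    col_freqs = [sum(f[i][j] for i in range(4)) for j in range(4)]
    det_f = determinant(f)
    norm = math.sqrt(math.prod(row_freqs) * math.prod(col_freqs))
    if det_f <= 0 or norm == 0:
        raise ValueError("distance undefined for this pair")
    return -0.25 * math.log(det_f / norm)


print(paralinear_distance("AAAACCCCGGGGTTTT", "AAAGCCCTGGGATTTC"))
```

Because the marginal frequencies of both sequences enter the normalization, the distance does not assume that the two taxa share the same base composition, which is the property that makes it useful when nt3 composition varies among species.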

Interior-branch tests (Rzhetsky and Nei, 1992) and relative rate tests (Takezaki et al., 1995)
implemented in PHYLTEST 2.0 written by S. Kumar (with Kimura 2-parameter + Γ correction for
multiple substitutions) were used to test the hypotheses of star phylogeny and rate constancy
among lineages, respectively. The hypothesis of long-branch attraction was tested by Monte
Carlo simulations according to Huelsenbeck (1997), using the Siminator program. Parametric
bootstrapping was performed to assess whether the presence of a given clade in a tree might
result from cumulative phylogenetic error in component branches rather than from significant
phylogenetic signal (Huelsenbeck and Rannala, 1997).

The simultaneous analysis involved sequences of white and G6pd combined with the sequences of
the mitochondrial gene ND5 and expansion segment D2 of the nuclear ribosomal 28S gene
(Krzywinski et al., in press; also see Table 3, present paper). Combining the morphological
data set of Sallum et al. (2000) with our molecular data was not possible because only four
species were common to both studies. Before the combined analysis, the presence of conflict
between the data sets was evaluated by using the incongruence length difference (ILD) test
(Farris et al., 1995), implemented in PAUP* as described by Cunningham (1997).
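The arithmetic behind the ILD test is simple once the most-parsimonious tree lengths are known: the statistic is the length of the shortest tree for the combined data minus the sum of the shortest tree lengths for each partition, and it is compared with the same quantity computed on random partitions of equal size. The Python sketch below shows only that bookkeeping. The tree lengths must come from a parsimony program (PAUP* in this study), all numbers are hypothetical, and the "+1" in the P-value is one common permutation-test convention assumed here rather than taken from the paper.

```python
def ild_statistic(combined_length, partition_lengths):
    """Incongruence length difference: extra steps forced by combining partitions."""
    return combined_length - sum(partition_lengths)


def ild_p_value(observed, null_values):
    """Proportion of replicates (observed included) at least as large as observed."""
    at_least = sum(1 for d in null_values if d >= observed)
    return (at_least + 1) / (len(null_values) + 1)


# Hypothetical tree lengths (steps), for illustration only.
observed = ild_statistic(1250, [700, 520])
# Hypothetical ILD values from nine random partitions of the same sizes.
null = [5, 12, 31, 8, 40, 3, 10, 7, 15]

print(observed)
print(ild_p_value(observed, null))
```

A small P-value indicates that the original partitions conflict more than random partitions of the same characters would.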

Results 

Gene  Sequences  and  Alignment 

The white gene in An. gambiae contains five exons (Besansky et al., 1995). The segment used
for this study, encompassing most of exon IV, intron 4, and the 5' half of exon V, was
amplified and sequenced from 15 mosquito species. Attempts to amplify the whole fragment from
three outgroup species, Aedeomyia squamipennis, Armigeres subalbatus, and Uranotaenia
sapphirina, were unsuccessful, yielding in these cases only partial sequences from the 5'-end
of the fragment. The gene structure described above was observed in all anophelines except
An. albitarsis, for which the sequence was intronless. Characteristic of nonanophelines was an
additional intron within exon V (Fig. 1), as reported earlier by Besansky and Fahey (1997).
The nucleotide alignment of the white gene with intron sequences removed was 801 characters
long. The coding sequences of the complete fragment varied in length, ranging from 726 bp in
Ch. bathana and Orthopodomyia alba to 780 bp in An. albitarsis, because of a highly variable
region spanning codons 40-83. Except for Nyssorhynchus species, the sequence length was
conserved within subgenera of Anopheles.

The G6pd gene structure, as determined in Drosophila melanogaster, has four exons (Fouts
et al., 1988). The fragment analyzed in this study, including nearly half of exon II, intron 2,
exon III, intron 3, and the 5'-end of exon IV (Fig. 1), was obtained from 16 species. Most of
this fragment (73% of the 5'-end) was also obtained from An. intermedius, An. kompi,
An. pseudopunctipennis, and Ch. bathana. The gene could not be amplified from Tx. rutilus and
An. squamifemur, possibly because of DNA degradation, long introns, mismatches between primers
and target sequences, or some combination of these. Introns 2 and 3 were found in all
Anophelinae, whereas in the other mosquitoes intron 3 was absent. The coding sequences were
equal in length, except for Or. alba, in which the sequence was one codon longer than in the
other species. The resulting G6pd nucleotide alignment was 462 characters long.

Table 3. Mean nucleotide composition and numbers of aligned, variable, and informative sites
for the white, G6pd, ND5ᵃ, and D2ᵉ genes.

                                             nt2
                                 ------------------------------------------
               Overall      nt1          All sites    Hydrophobicᵇ  Hydrophilicᶜ  nt3
white
  A            20.4         27.0         20.9         12.3          25.0          13.4
  C            27.4         21.1         25.6         31.3          22.9          35.4
  G            26.3         29.9         15.0          4.2          20.4          34.1
  T            25.8         22.0         38.4         52.2          31.7          17.1
  Variableᵈ    402 (50.1)   93 (34.8)    64 (24.0)    13 (15.5)     51 (27.9)     245 (91.8)
  Informative  336          65           40            5            35            231
  Aligned      801          267          267          84            183           267
G6pd
  A            23.6         23.8         35.3         16.3          46.4          11.8
  C            23.7         23.7         16.3         13.5          18.0          30.9
  G            29.5         30.6         19.4         20.8          18.6          38.6
  T            23.2         21.8         29.0         49.4          29.0          18.7
  Variableᵈ    247 (53.5)   60 (39.0)    43 (27.9)    13 (21.7)     30 (31.9)     144 (93.5)
  Informative  197          40           25            5            20            132
  Aligned      462          154          154          60            94            154
ND5
  Variableᵈ    309 (58.9)   98 (56.0)    66 (37.7)                                147 (84.0)
  Informative  233          73           47                                       115
  Aligned      525          175          175                                      175
D2
  Variableᵈ    244 (70.1)
  Informative  180
  Aligned      348

ᵃFrom Krzywinski et al. (2001).
ᵇRefers to transmembrane or buried regions in white and G6pd, respectively.
ᶜRefers to external or exposed regions in white and G6pd, respectively.
ᵈIn parentheses is given percent of aligned characters.
ᵉFor D2, only numbers of characters in regions included in the analysis are given.
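The counts of variable and informative sites reported in Table 3 follow standard definitions: a site is variable if at least two character states occur, and parsimony-informative if at least two states each occur in at least two sequences. The following Python sketch shows how such counts could be obtained overall and by codon position. It is an illustration under those definitions, with hypothetical function names and a toy alignment, and is not the authors' code.

```python
from collections import Counter


def site_classes(alignment, ignore="-?N"):
    """Return (variable, parsimony_informative) site counts for equal-length sequences."""
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(ch for ch in column if ch not in ignore)
        if len(counts) > 1:
            variable += 1
            # informative: at least two states, each present in two or more sequences
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return variable, informative


def by_codon_position(alignment):
    """Site counts for codon positions 1-3 of an in-frame, intron-free alignment."""
    return {
        pos + 1: site_classes([seq[pos::3] for seq in alignment])
        for pos in range(3)
    }


# Toy in-frame alignment (two codons, four taxa), for illustration only.
toy = ["ATGAAA", "ATGAAG", "ATGCAG", "ATTCAA"]
print(site_classes(toy))
print(by_codon_position(toy))
```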

Nucleotide  Composition  and  Sequence 
Divergence 

Within genes, overall nucleotide frequencies were nearly equal (Table 3). Strong bias was
found at the third codon positions (nt3), where G + C accounted for 70% of all bases. Between
genes, mean nucleotide composition across species was similar, except for second codon
positions (nt2), where white was rich in T + C and A + T predominated in G6pd.
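The G + C content at third codon positions discussed above is straightforward to compute from an in-frame coding sequence. A minimal Python sketch follows; the function name and the toy sequence are assumptions for illustration.

```python
def gc3(coding_seq):
    """Percent G + C at third codon positions of an in-frame coding sequence."""
    third = [b for b in coding_seq.upper()[2::3] if b in "ACGT"]
    if not third:
        raise ValueError("no unambiguous third-position bases")
    return 100.0 * sum(b in "GC" for b in third) / len(third)


# Toy sequence ATG GCC AAA TTC: third positions are G, C, A, C.
print(gc3("ATGGCCAAATTC"))
```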

Analyses of nt3 sites in each species individually also revealed strong differences in base
composition (Fig. 2). This heterogeneity was highly significant for both genes as revealed by
a test of independence (P ≪ 0.01). In G6pd, relatively low G + C content in two outgroup
species, Ad. squamipennis and Ur. sapphirina, accounted for most of the heterogeneity, as we
found by running the test after sequential exclusion of taxa with extreme base composition.
Only exclusion of both species eliminated the heterogeneity in base composition (P = 0.10). In
white the strongest differences were observed within the genus Anopheles: members of
Kerteszia, as well as An. intermedius and An. mattogrossensis (belonging to the Arribalzagia
Series of the subgenus Anopheles), showed no or slight G + C bias, whereas G + C content in
An. gambiae exceeded 90%. In contrast, the two Arribalzagia species were more biased for G + C
at G6pd nt3 sites. The homogeneity of base frequencies in G6pd was not rejected by the χ² test
when all codon positions combined were analyzed. In contrast, this homogeneity was rejected
for the white gene (P ≪ 0.01). This result reflects strong influence of the nt3 sites on
overall base frequencies in white, because nt1 and nt2 sites are quite homogeneous (P = 0.999
in both cases).

Figure 2. G + C content at third codon positions (% G3 + C3) in the white and G6pd genes in the
mosquito species studied. Intron G + C percentage (hatched areas) is mapped onto exon G + C
content to show the correlation between base composition of coding and noncoding regions of
the genes. Dashed lines represent mean G + C content across all species. Note that
An. albitarsis lacks introns in the white gene. G6pd sequences from An. squamifemur and
Tx. rutilus were not available.
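A test of homogeneity of base frequencies among taxa can be framed as a χ² test on a taxa-by-base contingency table of counts. The sketch below computes the statistic and its degrees of freedom in plain Python; converting these to a P-value needs a χ² distribution function, which is left out here. The function name and the counts are hypothetical, and the sketch assumes every base occurs at least once in the table.

```python
def chi_square_homogeneity(table):
    """Chi-square statistic and degrees of freedom for a taxa x base table of counts.

    table: list of rows, one per taxon, each holding counts of A, C, G, T.
    Assumes every row and every column has a nonzero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical third-position base counts (A, C, G, T) for three taxa.
counts = [
    [30, 95, 90, 52],
    [12, 120, 115, 20],
    [70, 60, 62, 75],
]
print(chi_square_homogeneity(counts))
```

Rerunning the function after dropping rows for taxa with extreme composition mirrors the sequential-exclusion approach described in the text.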

Ranges of sequence divergences at increasing taxonomic levels for G6pd and white gene
fragments are presented in Table 4.
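Uncorrected pairwise divergence (the p-distance) is the proportion of compared sites at which two aligned sequences differ. The Python sketch below computes it for a pair and reports the minimum and maximum over a set of sequences, which is the kind of range summarized in Table 4. Function names and toy sequences are assumptions for illustration; gaps and ambiguous bases are skipped.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, skip="-?N"):
    """Proportion of differing sites among sites where neither sequence is skipped."""
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def divergence_range(sequences):
    """(min, max) uncorrected divergence over all pairs in a name -> sequence dict."""
    distances = [
        p_distance(sequences[x], sequences[y])
        for x, y in combinations(sorted(sequences), 2)
    ]
    return min(distances), max(distances)


# Toy sequences, for illustration only.
toy = {"a": "AAAA", "b": "AAAT", "c": "AATT"}
print(divergence_range(toy))
```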


Table 4. Ranges of uncorrected pairwise sequence divergences at increasingly inclusive
taxonomic levels for G6pd and white gene fragments. Note that two subgenera of the genus
Anopheles are included.

                        G6pd                white
                    Min      Max        Min      Max
sg. Kerteszia       0.051    0.082      0.071    0.109
sg. Anopheles       0.101    0.165      0.137    0.238
Anopheles           0.051    0.247      0.071    0.298
Anophelinae         0.051    0.247      0.071    0.317
Outgroup            0.263    0.364      0.123    0.271
Ingroup-outgroup    0.218    0.323      0.178    0.336

Phylogenetic  Analysis 

G6pd. — The MP analyses of G6pd data under different weighting schemes resulted in trees
showing little agreement in relationships within Anophelinae. Of the deeper nodes, only the
basal position of Chagasia was recovered in all trees, whereas the position of other clades
depended on the weighting applied. The position of Bironella as a sister group to Anopheles
was recovered in only one of the trees derived under the nt3Ti = 0 weighting (and also in the
ML tree). In all other trees, Bironella was associated either with subgenus Cellia or as a
sister taxon to a clade consisting of Cellia and a subset of species from the subgenus
Anopheles. None of the searches recovered monophyly of the subgenus Anopheles. Subgenera
Cellia and Kerteszia were found in interchanging positions either among basal or most-derived
clades. Not surprisingly, the bootstrap majority-rule consensus trees were very poorly
resolved. Of the relationships inferred by using ML, only the clades well-supported in MP
received ML bootstrap proportions >50% (Fig. 3).

white. — Phylogenetic analyses of the white gene based on both MP and ML consistently
recovered deep relationships within the Anophelinae, with Chagasia as a basal lineage and
Bironella as a sister group of Anopheles (Fig. 4). All but one analysis indicated monophyly of
Anopheles subgenera; only equally weighted parsimony inferred that subgenus Anopheles was
paraphyletic, showing a clustering of An. pseudopunctipennis with Kerteszia rather than with
the remaining subgenus Anopheles species. In most trees An. kompi (subgenus Stethomyia)
assumed a basal position relative to all other species of the genus Anopheles. Within
Anopheles, two major monophyletic lineages were suggested but were not supported in MP trees:
(Nyssorhynchus + Kerteszia) and (Cellia + subgenus Anopheles). Lophopodomyia was placed in
various locations, depending on the inference method: as a clade branching early from an
Anopheles stem (after Stethomyia), grouped with Stethomyia forming a basal lineage within
Anopheles, or associated with (Nyssorhynchus + Kerteszia).

Figure 3. Phylogenetic relationships within mosquitoes inferred from the G6pd gene sequences
by maximum likelihood using a submodel (SYM; Zharkikh, 1994) within the general time
reversible (GTR) model, assuming equal base frequencies, six substitution rates, and
adjustments for among-site rate heterogeneity (SYM + I + Γ). The best-fit tree
(−ln L = 4265.82) was obtained using transformation probabilities (A ↔ C = 1.935,
A ↔ G = 5.237, A ↔ T = 2.136, C ↔ G = 3.580, C ↔ T = 13.680, and G ↔ T = 1) and gamma shape
parameter (α = 1.2053) estimated with PAUP* and an observed proportion of invariable sites
(I = 0.4654). Numbers at nodes represent bootstrap support >50% for equally/nt3Ti = 0/nt3 = 0
weighted parsimony above the line, and maximum likelihood analyses below the line. Scale bar:
0.1 substitutions/site.

Figure 4. Phylogenetic relationships within mosquitoes based on the white gene sequences. The
ML tree (−ln L = 7738.48) was inferred by using the Kimura (1981) three-parameter model with
unequal (observed) base frequencies taken into account (K3Puf) and site-specific rate
differences accommodated by assuming an observed proportion of sites to be invariable
(I = 0.4956) and the remaining sites assumed to follow a discrete approximation of the gamma
distribution (α = 1.3644 estimated with PAUP*). The following rates of substitution were
estimated by using PAUP*: A ↔ C and G ↔ T = 1, A ↔ G and C ↔ T = 5.416, A ↔ T and
C ↔ G = 1.704. The table shows the bootstrap support for the nodes marked with letters.

Despite  strong  variability  in  nt3  base  com¬ 
position  among  taxa,  the  tree  constructed 
with  the  LogDet/paralinear  method  was 
similar  to  those  inferred  with  MP  or  ML  (data 
not  shown).  The  only  difference  was  para- 
phyly  of  Cellia  (not  supported  by  bootstrap), 
which  was  always  derived  as  monophyletic 
by  other  inference  methods  and  with  other 
data  sets. 

Combined analysis. — The ILD test to assess congruence among data sets was applied to the
nuclear genes used in this study and to the ND5 and D2 genes from Krzywinski et al. (2001).
(For ND5 and D2 description see Table 3.) The results suggested that all genes except ND5
were significantly incongruent (Table 5), despite the fact that the tree topologies derived
from each gene separately were congruent for the more strongly supported relationships.
Because conducting both separate and combined phylogenetic analyses may lead to better
understanding of the data at hand (Sullivan, 1996), we combined all available sequences for a
simultaneous analysis.

We conducted the analysis with and without An. squamifemur, a representative of the small and
rare subgenus Lophopodomyia, because sequences of two genes, ND5 and G6pd, were not available
from this taxon (for opposing views on the effects of incomplete data matrices in phylogeny
reconstruction, see Huelsenbeck, 1991; Wiens and Reeder, 1995).

In the full (22-species) ML tree, An. squamifemur was inferred as a sister taxon of
Nyssorhynchus + Kerteszia (Fig. 5). Inclusion of this species had no effect on the position of
other clades, although the support for Anopheles minus Stethomyia and Cellia + subgenus
Anopheles was substantially less than for the 21-species data set. Parsimony analyses of the
extended versus 21-species data set led to minor changes in tree topology. However, apart from
Nyssorhynchus + Kerteszia, none of the relationships among Bironella and subgeneric clades of
Anopheles were well-supported.

Discussion 

Phylogenetic  Utility  of  the  Genes 

In their study of insects, Soto-Adames et al.
(1994) suggested that G6pd should be
useful in phylogenetic reconstruction from
generic to ordinal levels. Poor resolution and
low support for the inferred clades show that
the G6pd gene has limited utility as a phylogenetic
marker within Anophelinae and perhaps
in mosquitoes generally. This example
supports the observation of Mardulyn and
Whitfield (1999) that good performance of a
gene in one taxon cannot always be easily
extrapolated to its performance in other, even
closely related, taxa.

In contrast to G6pd, the white gene appears
much more informative in anophelines.
However, the white gene alignment was
nearly twice as long as the G6pd data. Because
the proportions of variable and informative
sites overall and for each codon position were
nearly identical in both genes, one might
wonder whether the higher phylogenetic information
content of white is merely a matter
of longer sequences analyzed. To answer this,
we used a 5'-fragment of white (with complete
sequences for all taxa) roughly equal
in length to the G6pd fragment (453 aligned
characters containing 206 informative sites)
and reanalyzed this fragment by a MP


Table 5. Results of the ILD test. P-values are given for every gene pair comparison and each gene versus all
other genes combined.

                         G6pd                 ND5             D2        All genes
                   All sites   1+2     All sites   1+2               All sites   1+2
white  All sites     0.001    0.089      0.177    0.261     0.001      0.003
       1+2           0.077    0.041      0.910    0.945     0.001      0.563    0.182
G6pd   All sites                         0.557    0.611     0.013      0.001
       1+2                               0.877    0.579     0.022      0.256    0.040
ND5    All sites                                            0.055      0.511
       1+2                                                  0.005      0.753    0.621
D2                                                                     0.010    0.001
Figure 5. Phylogenetic relationships within mosquitoes
based on the extended combined data set of
D2, ND5, G6pd and white genes. The ML tree (−ln L
= 21309.77) was inferred using a submodel (TIV; Rodriguez
et al., 1990) within the GTR model and assuming
unequal (observed) base frequencies, one transition
rate (A ↔ G and C ↔ T = 5.406), four transversion
rates (A ↔ C = 1.127, A ↔ T = 3.888, C ↔ G = 2.851,
G ↔ T = 1), and adjustments for among-site rate heterogeneity
(observed value of I = 0.310, α = 1.03S). Substitution
rates and gamma shape parameter α were estimated
by using PAUP*. Numbers at nodes represent
bootstrap support >50% for equally/nt3Ti = 0/nt3 =
0-weighted parsimony above the line, and maximum
likelihood analyses below the line for the 22-species and (in
parentheses) the 21-species data sets. Thick lines represent
the branches with 100% bootstrap support in all
four analyses.


approach. The inferred trees were similar in
topology and resolution to the trees inferred
from the total white gene fragment, although
bootstrap support for some clades dropped
(e.g., 62% vs. 74% for Anophelinae). This result
suggests that white is a better source
of phylogenetic information in mosquitoes
than G6pd.
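
The comparison above rests on counts of parsimony-informative sites (206 of 453 aligned characters, about 45%, in the truncated white fragment). As a minimal sketch of how such a count can be obtained, the function below applies the usual definition: a site is informative when at least two character states each occur in at least two taxa. The toy alignment and taxon labels are hypothetical, and the treatment of gaps and ambiguity codes as missing data is an assumption, not necessarily the setting used in the study.

```python
from collections import Counter


def is_parsimony_informative(column, missing="-?N"):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(base for base in column if base not in missing)
    shared_states = [base for base, n in counts.items() if n >= 2]
    return len(shared_states) >= 2


def count_informative(alignment):
    """Count parsimony-informative columns in a {taxon: sequence} alignment."""
    columns = zip(*alignment.values())
    return sum(is_parsimony_informative(col) for col in columns)


# Hypothetical four-taxon alignment, six aligned characters.
toy_alignment = {
    "taxon1": "ACGTAC",
    "taxon2": "ACGTTC",
    "taxon3": "ATGAAG",
    "taxon4": "ATGATG",
}
n_informative = count_informative(toy_alignment)
proportion = n_informative / len(toy_alignment["taxon1"])
```

Applying the same function to fragments of equal length from two genes gives the kind of like-for-like comparison described in the text.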

Unequal Evolutionary Rates and Phylogeny
Reconstruction

The relationship between Nyssorhynchus
and Kerteszia inferred from different data sets
apparently represents an interesting test case
of long-branch attraction. The clade formed
by these subgenera was recovered with high
bootstrap support regardless of the inference
method from D2, D2 + ND5, all four
genes combined (Krzywinski et al., 2001;
and present data), and morphological characters
(Sallum et al., 2000), strongly suggesting
a true phylogenetic relationship. Surprisingly,
under unweighted MP of the white
gene, Kerteszia was clustered with the subgenus
Anopheles, the only instance when
a clade was not supported by the white
gene but was strongly supported by other
genes or combined data. Felsenstein (1978)
pointed out that parsimony converges to
an incorrect phylogeny if the evolutionary
rates along the lineages are strongly unequal.
Relative rate tests indicate that the
rates of the white gene indeed strongly depart
from constancy (Table 6). Increased rates
within long-branched lineages apparently
led to multiple changes at numerous nt3 sites,
obscuring the true phylogenetic signal for
Kerteszia's sister taxon relationship in unweighted
MP. Elimination from the analysis
of third codon transitions, elimination of
third codon positions altogether, or implementation
of the ML method that corrects
for multiple substitutions results in the recovery
of the Nyssorhynchus + Kerteszia clade.
Although the Monte Carlo simulations did not
clearly show that branches are long enough
to attract, long-branch attraction should not
be excluded here. The HKY85 + Γ model
used for data simulations may be too simple
for the data at hand, or the test may
not perform well when strong base composition
differences exist across taxa (see below);
these together make the test very conservative
(Huelsenbeck, pers. comm.).
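
The relative rate comparison can be made concrete with a short sketch. In a two-cluster test the difference in average distance from the reference taxon to each cluster, La − Lb, is divided by its standard error, and the resulting Z is compared with the two-tailed normal critical value (1.96 at the 0.05 level). The values below are copied from Table 6; treating the fourth column of that table as the standard error of La − Lb is an assumption, although the tabulated Z values agree with that reading to within rounding.

```python
# (taxon A, taxon B, La - Lb, assumed standard error, Z as reported in Table 6)
TABLE6 = [
    ("Kerteszia", "Nyssorhynchus", 0.6848, 0.2171, 3.1548),
    ("sg. Anopheles", "Cellia", 0.3409, 0.0999, 3.4107),
    ("sg. Anopheles", "Kerteszia", -0.3750, 0.1882, 1.9924),
    ("sg. Anopheles", "Lophopodomyia", 0.2586, 0.0965, 2.6789),
    ("Cellia", "Lophopodomyia", -0.0823, 0.0660, 1.2466),
]

CRITICAL_Z = 1.96  # two-tailed normal critical value at the 0.05 level


def z_score(delta, std_error):
    """Absolute rate difference expressed in standard-error units."""
    return abs(delta) / std_error


# One record per comparison: (taxon A, taxon B, recomputed Z, rate constancy rejected?)
results = [
    (a, b, z_score(delta, se), z_score(delta, se) > CRITICAL_Z)
    for a, b, delta, se, _reported in TABLE6
]
```

Only the Cellia versus Lophopodomyia comparison falls below the critical value, matching the asterisks in Table 6.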

Another possible factor underlying incongruence
between the unweighted MP
analysis of the white gene and other analyses
may be convergence in nucleotide
content between different Anopheles lineages.
Strong differences in nt3 base composition
among species (Fig. 2) and associated differences
in synonymous codon usage are observed
in the white gene (see Besansky and
Fahey, 1997). Nonrandom usage of codons
is attributable to either mutational bias or
selection. Moriyama and Powell (1997) suggested
that most codon bias in Drosophila results
from selection for efficient translation
related to the isoaccepting tRNA availability
in highly expressed genes. The information
concerning levels of white expression in
mosquitoes is very scarce. However, the evidence
available from An. gambiae, the species
Table 6. Two-cluster relative rate tests for the white gene within Anopheles. Gapped regions and those with
missing data were excluded before making distance calculations. In all comparisons Chagasia was used as a reference
taxon.

Taxon A          Taxon B           La-Lb     V(La-Lb)    Z
Kerteszia        Nyssorhynchus     0.6848    0.2171      3.1548*
sg. Anopheles    Cellia            0.3409    0.0999      3.4107*
sg. Anopheles    Kerteszia        -0.3750    0.1882      1.9924*
sg. Anopheles    Lophopodomyia     0.2586    0.0965      2.6789*
Cellia           Lophopodomyia    -0.0823    0.0660      1.2466

*Z-value significant at 0.05.


with the greatest codon bias, suggests that
this gene is expressed at very low levels
(Besansky et al., 1995). Even if we assume
that white is actually expressed at very high
levels, but in few tissues and in short enough
bursts to escape detection, substantial differences
in codon usage among species are difficult
to explain. Thus, selection seems unlikely
to have played an important role in
codon bias in white. Rather, a clear positive
correlation between G + C content of introns
and exon nt3 sites (Pearson product-moment
correlation coefficient r = 0.84, P < 0.01; see
also Fig. 2) strongly suggests that mutation
bias is responsible for the observed patterns
of codon usage. Interestingly, no such correlation
was evident in the G6pd gene (r = 0.43,
P > 0.05).
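
For readers who wish to repeat this kind of check on their own sequences, the Pearson product-moment coefficient can be computed as in the sketch below. The per-species G + C fractions are hypothetical placeholders, not the values behind Fig. 2, and the function name is ours.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical G + C fractions, one pair of values per species.
intron_gc = [0.30, 0.35, 0.42, 0.50, 0.55]
exon_nt3_gc = [0.45, 0.52, 0.60, 0.78, 0.80]
r = pearson_r(intron_gc, exon_nt3_gc)
```

A strong positive r between intron and nt3 composition is what a mutational-bias explanation predicts, because introns are not subject to selection on codon usage.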

Sequence  Conservation  and  Protein 
Structure 

Base composition at the nt2 sites suggests
that different evolutionary forces act on the
G6pd and white genes, reflecting the structural
constraints imposed on their protein
products.

G6PD, a cytosolic globular protein, has
a highly conserved three-dimensional structure
of hydrophilic external parts and a hydrophobic
core (Naylor et al., 1996; Notaro
et al., 2000). The G6PD fragment under study
is located close to the NH2 terminus of the
molecule and encodes portions of both external
and core regions. When mosquito G6pd
sequences are partitioned into nucleotide
triplets encoding exposed or buried residues,
as predicted on the basis of human G6PD tertiary
structure (Notaro et al., 2000), sharply
different patterns of nucleotide composition
at nt2 are revealed in the two groups (Table 3).
Buried amino acids, more than half of which
are hydrophobic, are strongly biased toward
T, in accord with Naylor et al. (1995). In
contrast, most of the exposed amino acids are
hydrophilic in nature, with a predominance
of A or G at nt2 positions. Nonsynonymous
substitutions are located mainly in the hydrophilic
external parts of the protein, similar
to the findings of Notaro et al. (2000).
However, most changes are conservative
replacements involving amino acids of similar
properties.
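
The link between hydrophobicity and T at nt2 follows from the genetic code: the codons for Phe, Leu, Ile, Met and Val all carry T at the second position. A minimal sketch of tabulating base composition by codon position is given below; the short in-frame sequence is a hypothetical example chosen to encode only those five residues, and the function name is ours.

```python
from collections import Counter


def composition_by_codon_position(cds):
    """Base frequencies at nt1, nt2 and nt3 of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    result = {}
    for offset in range(3):
        bases = cds[offset::3]          # every third base, starting at this position
        counts = Counter(bases)
        result[f"nt{offset + 1}"] = {b: counts[b] / len(bases) for b in "ACGT"}
    return result


# Hypothetical fragment encoding Met-Leu-Val-Ile-Phe (all hydrophobic residues).
buried_fragment = "ATGCTGGTTATCTTC"
composition = composition_by_codon_position(buried_fragment)
```

Running the same tabulation separately on triplets assigned to buried and exposed residues gives the kind of partitioned nt2 composition reported in Table 3.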

The white gene encodes a protein belonging
to a superfamily of ABC transporters
(Higgins, 1992). Characteristic of these proteins
are two domains: an ATP-binding domain
located at the cytoplasmic face of the
membrane, and a transmembrane domain
spanning the cellular membrane. The fragment
of white chosen for this study consists
of the carboxy terminus of the cytoplasmic
domain and most of the transmembrane domain,
which in turn encompasses five putative
membrane-spanning α-helices and four
intervening loops located outside the membrane
(Zwiebel et al., 1995). To preserve the
conformational stability of the protein, the
membrane-spanning fragments are expected
to be rich in hydrophobic residues. Indeed,
in comparison with the external (loop) regions,
base composition in this gene partition
is strongly biased toward C + T in second
positions (Table 3).

These examples suggest that the structural
constraints limiting character-state space at
nt2 may be widespread in nature. In phylogenetic
reconstruction, particularly in the
case of more distantly related taxa, such constraints
are a probable source of homoplasy
in characters traditionally treated as most
reliable (Naylor et al., 1995). In the present
study, these constraints are unlikely to have
contributed substantial homoplasy, given the
small number of informative sites at nt2 in
hydrophobic regions (Table 3); moreover, excluding
external nt2 sites from the G6pd gene
did not improve MP results. However, the
relationship between protein structure and
sequence conservation revealed by our data
may have some bearing on the improvement
of existing models of amino acid sequence
evolution (Lio and Goldman, 1999, and references
therein).

The  Weight  of  Evidence:  Combining 
Independent  Data  Sets 

The  problem  of  how  to  analyze  indepen¬ 
dent  data  sets  is  a  subject  of  persistent  con¬ 
troversy.  Some  authors  (Miyamoto  and  Fitch, 
1995)  suggest  that  data  partitions  always 
should  be  considered  separately  in  a  tax¬ 
onomic  congruence  framework  (Mickevich, 
1978).  Others  (e.g.,  Kluge,  1989)  claim  that 
data always should be combined in a simultaneous
analysis because this maximizes
the informativeness of the data and yields
a strong estimate of phylogeny. Proponents
of  a  third  alternative  (Bull  et  al.,  1993;  de 
Queiroz,  1993)  suggest  that  the  decision  to 
combine  the  data  should  depend  on  the  de¬ 
gree  of  incongruence  between  separate  par¬ 
titions.  Several  statistical  tests  have  been 
used  to  evaluate  incongruence  among  data 
partitions  (Templeton,  1983;  Kishino  and 
Hasegawa,  1989;  Rodrigo  et  al.,  1993;  Farris 
et  al.,  1995;  Huelsenbeck  and  Bull,  1996). 
However,  the  existing  tests  seem  too  conser¬ 
vative  and  inadequate  to  address  the  issue  of 
when  simultaneous  analysis  should  be  per¬ 
formed  (Sullivan,  1996;  Cunningham,  1997; 
Remsen  and  DeSalle,  1998).  Here  we  applied 
the  ILD  test,  which  performs  better  in  pre¬ 
dicting  the  compatibility  of  combined  data 
than  goodness-of-fit  tests  do  (Cunningham, 
1997)  and  which  is  commonly  used  in  phy¬ 
logenetic  studies  (Caterino  et  al.,  2000).  Ac¬ 
cording  to  this  test,  only  ND5  sequences  can 
be  combined  with  any  other  gene  (Table  5). 
Interestingly, when topology and bootstrap
values were examined in separate gene trees,
topological incongruence was generally limited
to unsupported or poorly supported
nodes, whereas highly supported branches
were congruent across the trees. When P =
0.01 was taken as a significance threshold
(Cunningham, 1997), G6pd was congruent
with D2, and also with the white gene at
nt1 + nt2 positions. Congruence was also
suggested when nt1 + nt2 positions from one
gene were compared with all positions of
another protein-coding gene. The discrepant
results between all positions and nt1 + nt2
positions cannot be completely accounted for
by a lack of resolution, and therefore the
perception of congruence, in the latter data
partition because nt1 + nt2 of white produced
a well-resolved topology (Fig. 4). Even
after exclusion of nt3 sites, the ILD test indicated
that the white gene and also the ND5
gene were incongruent with D2. Such an incongruence
may result from extreme difference
in the evolutionary rates along some
branches. Taken together, these results indicate
that weak conflicting signals, probably
coming from sites affected by multiple
substitutions combined with differing
compositional biases, have profound effects
on the ILD test results. Moreover, they suggest
that improving the phylogenetic reconstruction
model by eliminating such sites will
improve the congruence between data sets.
Despite  the  professed  incongruence,  simul¬ 
taneous  analysis  did  not  reduce,  and  in  some 
cases  substantially  increased,  support  for  all 
clades.  We  agree  with  the  notion  that  when 
different  partitions  yield  strongly  different 
and  well-supported  relationships,  simulta¬ 
neous  analysis  should  not  be  performed. 
However,  when  the  topological  incongru¬ 
ence  is  concentrated  in  unsupported  clades, 
as  in  the  present  study,  simultaneous  analy¬ 
sis  appears  beneficial.  Apparently,  when  the 
partitions  are  combined,  phylogenetic  sig¬ 
nals  from  separate  partitions  have  additive 
properties,  resulting  in  stronger  support  for 
the  inferred  clades.  Moreover,  different  par¬ 
titions  resolve  different  regions  of  the  tree, 
a  property  discussed  earlier  by  Pennington 
(1996).  Our  analysis  suggests  that  the  con¬ 
gruence,  or  lack  thereof,  between  data  sets 
from  real  taxa  is  a  complex  problem  not  yet 
well  understood. 
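
The dependence of these conclusions on the significance threshold can be illustrated with a few of the pairwise P-values. The values below are read from Table 5; the assignment of each value to a particular pair of partitions follows our reading of the table layout and should be treated as an assumption. A pair is called congruent when its P-value exceeds the chosen threshold.

```python
# Pairwise ILD P-values as read from Table 5 (partition labels are our shorthand).
ILD_P = {
    ("white all", "G6pd all"): 0.001,
    ("white nt1+nt2", "G6pd all"): 0.077,
    ("white nt1+nt2", "G6pd nt1+nt2"): 0.041,
    ("G6pd all", "D2"): 0.013,
    ("G6pd nt1+nt2", "D2"): 0.022,
    ("white nt1+nt2", "D2"): 0.001,
    ("ND5 nt1+nt2", "D2"): 0.005,
}


def congruent_pairs(p_values, alpha):
    """Pairs for which the null hypothesis of congruence is not rejected."""
    return {pair for pair, p in p_values.items() if p > alpha}


at_005 = congruent_pairs(ILD_P, 0.05)
at_001 = congruent_pairs(ILD_P, 0.01)
# Pairs whose verdict changes when the threshold is relaxed from 0.05 to 0.01.
threshold_sensitive = at_001 - at_005
```

In this subset, three comparisons involving G6pd switch from incongruent to congruent at the 0.01 threshold, whereas the white and ND5 comparisons with D2 at nt1 + nt2 remain incongruent under either threshold.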

Anophelinae Phylogeny

Most  of  the  relationships  inferred  with  the 
combined  data  (Fig.  5)  are  well  supported 
and in agreement with previous morphological
and molecular studies. Because the analyzed
loci are unlinked and independently
give congruent topology for better supported
clades, with the areas of conflict limited to unsupported
or poorly supported branches, we
believe that our phylogenetic hypothesis is a
reliable estimate of Anophelinae phylogeny,
with two exceptions.

The position of Bironella diverging after
Stethomyia is probably incorrect; instead, we
believe the white gene tree, with Bironella as
a sister taxon to Anopheles, reflects a true
evolutionary history that is swamped by the
noise introduced by other genes. Monophyly
of Anopheles relative to Bironella was suggested
by earlier molecular studies based on
a more limited sampling of white (Besansky
and Fahey, 1997) and COII (Foley et al.,
1998) genes. In contrast, Sallum et al. (2000)
stated  that  Anopheles,  as  traditionally  de¬ 
fined,  is  paraphyletic.  According  to  their 
newly  proposed  phylogeny  of  the  subfam¬ 
ily,  one  of  the  lineages  within  the  genus 
Anopheles  would  contain  species  of  Bironella 
as  well  as  the  subgenera  Lophopodomyia  and 
Stethomyia arising among species of the subgenus
Anopheles. Moreover, An. pseudopunctipennis
of the subgenus Anopheles would occupy
a basal position within this lineage,
before the divergence of Lophopodomyia and
well before Bironella and Stethomyia diverge.
The present results, however, contradict this
hypothesis. In both the white gene and combined
data tree, Bironella is one of the basal
branches of Anophelinae, and all species
of subgenus Anopheles, including An. pseudopunctipennis,
form a very strongly supported
monophyletic clade. According to the
Kishino-Hasegawa test, the tree constrained
to reflect the hypothesis of Sallum et al. (2000)
is significantly less likely (t = 3.8099, P =
0.0001) than the tree shown on Fig. 5. Similarly,
we rejected a close affinity of Bironella
to  the  subgenus  Anopheles  in  our  previous 
study  (Krzywinski  et  al.,  2001).  Monophyly 
of  the  subgenus  Anopheles  sensu  Sallum  et  al. 
(2000),  that  is,  including  two  other  Anophe¬ 
les  subgenera  and  Bironella,  is  based  on  six 
synapomorphies.  However,  the  authors  indi¬ 
cate  that  those  characters  are  homoplasious 
and  rather  inconsistent  (bootstrap  <50%, 
Bremer  support  =  2).  Low  support  from  mor¬ 
phology  and  strong  contradictory  evidence 
from  molecular  data  indicate  that  the  hy¬ 
pothesis  of  Sallum  et  al.  (2000)  concerning 
conflicting  clades  is  based  on  data  compro¬ 
mised  by  homoplasy.  As  such,  it  is  difficult  to 
argue  that  the  characters  used  in  their  study 
have  enough  resolving  power  for  those  prob¬ 
lematic  relationships  to  serve  as  a  founda¬ 
tion  for  a  substantial  change  of  the  estab¬ 
lished  systematics  of  the  group.  We  conclude 
that  discrepant  hypotheses  reflect  a  different 
interpretation  of  the  results  rather  than  real 
conflict  between  morphology  and  molecular 
data. 


In  fact,  reliable  inference  of  the  relation¬ 
ships  among  Bironella  and  basal  Anopheles 
clades  may  be  problematic.  Lack  of  resolu¬ 
tion  at  these  levels,  characteristic  of  all  phy¬ 
logenetic  studies  to  date,  can  result  from  (1) 
combining  conflicting  signal  if  different  data 
partitions  experienced  different  evolution¬ 
ary  histories  (gene  trees  vs.  species  trees), 
(2)  strongly  unequal  rates  of  evolution,  or  (3) 
nearly  contemporaneous  radiations.  Because 
the genes used in this and our previous study
(Krzywinski et al., 2001) did not yield major
phylogenetic conflicts, we discount the
first explanation. Separate analyses of the
G6pd and white data, whether partitioned
(1) according to nt1 + nt2 versus nt3 positions
or (2) as hydrophilic versus hydrophobic regions
(data not shown), indicate that rate differences,
although contributing to the problem,
are unlikely to be the main culprits. The most probable
explanation,  which  reconciles  morphological 
and  molecular  results,  is  a  rapid  radiation  of 
Bironella  and  basal  clades  within  Anopheles. 
This scenario, recently suggested on the basis
of analysis of the mitochondrial and nuclear
ribosomal genes (Krzywinski et al., 2001), is
also consistent with the present results, in
which the relevant branches are very short
(Table 7) and poorly supported.

The position of the subgenus Lophopodomyia,
represented by a single species, An.
squamifemur, remains unsupported and uncertain.
First, character sampling for An.
squamifemur  was  sparse,  with  only  D2  and 
white  sequences  being  obtained.  Second, 
Lophopodomyia  might  have  arisen  in  the  pro¬ 
cess  of  a  rapid  radiation  of  basal  Anophe¬ 
les  clades,  and  more  characters  may  not  be 
a remedy for the lack of support. The decreased
bootstrap values for the lineages of
subgenus Anopheles + Cellia and the genus
Anopheles excluding Stethomyia in the combined
analysis of the extended data set versus
the 21-species data set (Fig. 5) suggest
that An. squamifemur is indeed a problematic
taxon. Third, some weak conflict over the
position  of  Lophopodomyia  seems  to  exist  be¬ 
tween  the  data  sets.  In  contrast  to  the  white 
data  tree,  the  D2  data  tree  placed  An.  squam¬ 
ifemur  as  the  most  basal  clade  of  Anopheli¬ 
nae  (data  not  shown).  However,  considera¬ 
tion  of  this  apparent  conflict  should  be  tem¬ 
pered  by  the  possibility  that  the  position  of 
Lophopodomyia  on  the  D2  gene  tree  may  re¬ 
sult  from  misalignment  of  some  fragments 
of  An.  squamifemur  sequence  relative  to  other 
Table 7. Results of the interior-branch test based on selected four-cluster trees with the topology ((A,B),(C,D)).
Ch = Chagasia, Bi = Bironella, Ano = subgenus Anopheles, Cel = Cellia, Ker = Kerteszia, Nys = Nyssorhynchus,
Sth = Stethomyia. Although the addition of more data increases branch lengths, some of the branches connecting ingroup
taxa are still not significantly different from zero, consistent with a "star phylogeny."

Data set     A      B      C            D            CP(a)
white(b)     Ch     Bi     Ker          Nys          0.949
             Ch     Bi     Lph          Ker + Nys    0.941
             Bi     Sth    Ker          Nys          0.668
all 21(c)    Ch     Sth    Ano + Cel    Ker + Nys    0.907
             Ano    Cel    Sth          Ker + Nys    0.774
             Bir

(a) Complement of the probability (1 − α) that the interior branch in the tree is significantly different from zero.
(b) white gene with gaps and all missing sites removed.
(c) The 21-species data set of all four genes combined.
(d) Asterisk denotes significant values.


aligned D2 sequences. For a discussion of the
D2 alignment, see Krzywinski et al. (2001).

Despite  topological  incongruence  over 
the  position  of  Bironella  and  subgenera 
Lophopodomyia  and  Stethomyia  inferred  in 
the  present  study  and  in  Sallum  et  al. 
(2000), other relationships are fully congruent:
monophyly of Anophelinae; basal
position  of  Chagasia;  monophyly  of  the 
Anopheles  subgenera  Cellia,  Kerteszia,  and 
Nyssorhynchus;  sister  taxon  relationship  be¬ 
tween  Nyssorhynchus  and  Kerteszia;  and 
monophyly  of  the  Arribalzagia  Series  within 
the  subgenus  Anopheles.  Present  results  of  the 
combined  analysis  are  congruent  with  the 
hypothesis  presented  in  our  previous  study 
of  ND5  and  D2  genes  (Krzywinski  et  al., 
2001)  and  some  of  these  relationships  are 
also  suggested  in  other  studies.  Monophyly 
of  Anophelinae  and  the  ancient  origin  of 
Chagasia  are  congruent  with  the  traditional 
notion  (Ross,  1951)  and  previous  phyloge¬ 
netic  analysis  of  morphological  characters 
(Harbach and Kitching, 1998). The close relationship
of Anopheles subgenera Kerteszia and
Nyssorhynchus was suggested by Root (1922)
and  Zavortink  (1973).  Edwards  (1932),  fol¬ 
lowing  Christophers  (1924),  treated  Kerteszia 
as  a  species  group  within  Nyssorhynchus,  but 
Peyton  et  al.  (1992)  demonstrated  the  dis¬ 
tinctness  of  both  taxa.  Finally,  monophyly  of 
the  Arribalzagia  Series  was  hypothesized  by 
Wilkerson  and  Peyton  (1990). 

The  clade  of  the  subgenera  Anopheles  + 
Cellia  probably  reflects  a  historical  relation¬ 
ship.  Its  monophyly,  relatively  well  sup¬ 
ported  by  ML  analysis  of  the  21-species 
data  set  (78%)  and  D2  +  ND5  genes  (70%; 
Krzywinski  et  al.,  2001),  was  also  recovered 


by  the  weighted  MP  analyses  of  the  white 
gene.  Moreover,  very  high  parametric  boot¬ 
strap  support  (100%)  from  the  simulated 
combined  data  indicates  that  this  grouping 
is unlikely to be the result of cumulative random
error. Foley et al. (1998) argued that
the subgenera Anopheles and Cellia are paraphyletic
with regard to each other. In contrast,
the present analysis suggests that both
are monophyletic. The different taxa used in the two
studies prevent us from testing the hypothesis
of Foley et al. (1998), but very poor
support of lineages at these levels suggests
that the phylogenetic inference in their COII
study was strongly influenced by homoplasy.
Nevertheless, monophyly of some subgenera,
as delineated now, conceivably may
not stand when denser taxon sampling is
performed.

Biogeography 

Strongly  supported  relationships  derived 
in  the  present  analysis,  congruent  with 
other  studies,  provide  an  opportunity  for 
interpreting  the  results  in  a  biogeographic 
framework  with  relative  confidence.  Be¬ 
cause  Anophelinae  and  mosquitoes  in 
general  have  received  little  phylogenetic 
attention  (Munstermann  and  Conn,  1997), 
almost  nothing  is  known  about  their  origins 
and  historical  biogeography.  Belkin  (1962) 
speculated  that  "the  initial  differentiation 
of  the  subfamily  took  place  in  the  Ameri¬ 
can  Mediterranean  Region".  Harbach  and 
Kitching  (1998)  followed  Belkin  (1962) 
and  pointed  at  the  New  World  as  a  pos¬ 
sible  center  of  origin  of  the  subfamily, 
because Neotropical Chagasia took a basal
position within Anophelinae in their
analysis. Our results also support Belkin's
speculation  and  additionally  strengthen  it, 
based  on  the  reasoning  of  Bremer  (1992), 
who  developed  a  procedure  for  estimat¬ 
ing  ancestral  areas  of  individual  groups 
from  topological  information  in  their  area 
cladograms.  His  method  is  based  on  the 
assumption that areas positionally plesiomorphic
(basal) and frequent in the area
cladogram are more likely to be parts of
the ancestral range than are positionally
apomorphic (placed at the top of the cladogram)
and rare areas. According to Bremer's
(1992) rationale, basal placement of Chagasia
relative to other anophelines and the data
on the Neotropical distribution of this genus
and of four out of six subgenera of Anopheles
suggest that South America was the center
of origin of the subfamily (Fig. 6).
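
The ancestral-area calculation can be sketched in a few lines. Each area is treated as a binary character on the area cladogram. Under forward Camin-Sokal parsimony (ancestor lacking the area, gains only), the number of necessary gains G equals the number of maximal clades in which every terminal occupies the area; under reverse Camin-Sokal parsimony (ancestor occupying the area, losses only), the number of necessary losses L equals the number of maximal clades in which every terminal lacks it. The cladogram, terminal labels and area codes below are hypothetical, and the sketch assumes that every area is absent from at least one terminal so that L is never zero.

```python
def leaves(tree):
    """Terminal labels of a cladogram written as nested tuples."""
    if isinstance(tree, tuple):
        return [leaf for child in tree for leaf in leaves(child)]
    return [tree]


def count_blocks(tree, areas_of, area, present):
    """Number of maximal clades whose terminals all have (or all lack) the area."""
    if all((area in areas_of[leaf]) == present for leaf in leaves(tree)):
        return 1
    if not isinstance(tree, tuple):
        return 0
    return sum(count_blocks(child, areas_of, area, present) for child in tree)


def ancestral_areas(tree, areas_of):
    """Return {area: (G, L, AA)} following the G/L rescaling of Bremer (1992)."""
    all_areas = set().union(*areas_of.values())
    quotients = {}
    for area in all_areas:
        gains = count_blocks(tree, areas_of, area, present=True)
        losses = count_blocks(tree, areas_of, area, present=False)
        if losses == 0:
            raise ValueError(f"area {area!r} occurs in every terminal")
        quotients[area] = (gains, losses, gains / losses)
    largest = max(q for _, _, q in quotients.values())
    return {a: (g, l, q / largest) for a, (g, l, q) in quotients.items()}


# Hypothetical pectinate area cladogram with three areas X, Y and Z.
cladogram = ("T1", ("T2", ("T3", ("T4", "T5"))))
distribution = {
    "T1": {"X"}, "T2": {"X"}, "T3": {"X", "Y"}, "T4": {"Y"}, "T5": {"Y", "Z"},
}
aa = ancestral_areas(cladogram, distribution)
```

In this toy case the area occupied by the basal terminals (X) receives the highest rescaled value, which is the pattern that the argument for a South American origin relies on.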

The  inference  of  a  South  American  ori¬ 
gin  of  Anophelinae  and  the  monophyly  of 
subgenera  Anopheles  +  Cellia  and  their  de¬ 
rived  position  have  important  biogeographic 
Figure 6. Estimation of ancestral area for Anophelinae
based on the method of Bremer (1992). G = number
of necessary gains under forward Camin-Sokal parsimony;
L = number of necessary losses under reverse
Camin-Sokal parsimony; AA = G/L quotients rescaled
to a maximum value of 1 by dividing by the largest
G/L value.


implications.  The  evidence  from  plate 
tectonics  as  well  as  fossils  and  the  distri¬ 
bution  of  other  organisms  (Cox  and  Moore, 
1993;  Pitman  et  al.,  1993)  allows  us  to  pro¬ 
pose  the  following  hypothesis  of  Anophe¬ 
linae  history.  Cosmopolitan  distribution  of 
the  subgenus  Anopheles  should  not  be  re¬ 
garded  as  an  ancestral  state  but  rather  as 
a  result  of  relatively  recent  dispersal.  This 
proposal  predicts  that  the  lineage  of  Anophe¬ 
les  +  Cellia,  in  which  most  extant  clades  are 
found  in  the  Old  World,  originated  before  the 
breakup  of  the  western  Gondwanaland  in  the 
Late  Cretaceous.  The  first  branching  events 
within  the  subgenus  Anopheles  would  have 
taken  place  before  the  loss  of  the  land  connec¬ 
tion  between  Africa  and  South  America,  ~95 
Mya.  Breakup  of  the  continents,  leading  to  ef¬ 
fective  isolation  of  faunas,  may  have  resulted 
in  segregating  the  Neotropical  Pseudopunc- 
tipennis  Group,  the  basal  lineage  of  the  sub¬ 
genus  Anopheles,  from  the  other  stem  lineages 
in Africa. Creation of the land bridges between
Africa and Europe in the Paleocene
and the connection from Europe to North
America via Greenland, which existed until
the end of the Eocene, allowed further dispersal
of the subgenus to Eurasia and the Nearctic.
The corridor for migration between North
and South America, consisting of Aves Ridge
and the Greater Antilles, which probably existed
until the Late Eocene, ~49 Mya, could
have provided a route for certain lineages of the
subgenus Anopheles (Cycloleppteron and Arribalzagia
Series) to reenter South America
from the north. Alternatively, dispersal from
the north could have taken place in the mid-Miocene
(~15 Mya) via the Panama island
arc. Eastward movement of the Caribbean
plate in the Late Eocene and sea level fluctuations
might have caused some migrating species
to reach the islands only to be cut off and become
essentially isolated (for example, An.
grabhami of the Cycloleppteron Series). This
scenario  is  fully  congruent  with  the  branch¬ 
ing  order  in  the  trees  inferred  in  the  present 
study  and  largely  consistent  with  the  hy¬ 
pothesis  of  Sallum  et  al.  (2000)  regarding 
the  basal  position  of  the  Pseudopunctipen- 
nis  Group,  the  intermediate  position  of  Old 
World  and  Nearctic  taxa,  and  the  position 
of  the  Cycloleppteron  and  Arribalzagia  Se¬ 
ries  among  the  most  derived  lineages.  The 
absence  of  Cellia  in  the  New  World  sug¬ 
gests  that  the  radiation  of  this  subgenus 
might  not  have  been  triggered  until  the 
Late  Eocene,  when  the  connection  between 
Europe  and  North  America  was  lost.  The  as¬ 
sumed  basal  position  of  Bironella  relative  to 
Anopheles  clades  implies  that  the  ancestors 
of  this  lineage  migrated  to  the  landmass  of 
Australia  well  before  Australia  and  Antarc¬ 
tica  separated  from  South  America.  The 
timing  of  separation  is  uncertain,  but  they 
seem  to  have  parted  some  time  in  the  Early 
Cenozoic.  According  to  this  sequence  of 
events,  Australasian  Anopheles  fauna  have  a 
Eurasian  origin.  This  accords  with  Belkin's 
view  (1962)  but  contrasts  with  the  opinion  of 
Foley  et  al.  (1998),  who  hinted  at  a  two-way 
exchange  between  Australasian  and  Oriental 
anopheline  fauna  rather  than  immigration. 

Our  hypothesis  suggests  that  the  molec¬ 
ular  clock,  proposed  by  Foley  et  al.  (1998), 
predicting  divergence  of  the  lineages  lead¬ 
ing  to  D.  yakuba  and  Aedes/Culex  at  106-46 
Mya,  is  seriously  underestimated.  This  is  not 
unexpected  for  a  clock  based  on  a  highly  sat¬ 
urated  COII  gene.  The  present  data  support 
the  opinion  of  Hennig  (1981)  that  the  super¬ 
family  Culicoidea  might  have  existed  in  the 
Upper  Triassic  (215  Mya).  It  is  also  consistent 
with  Edwards  (1932),  who  suggested  an  an¬ 
cient  origin  of  mosquitoes  and  their  existence 
by  the  Jurassic.  A  very  limited  and  relatively 
young  fossil  mosquito  record  contributes  lit¬ 
tle  to  our  understanding  of  the  early  evolu¬ 
tion  of  the  group.  The  earliest  known  fossils, 
from  the  Late  Eocene,  indicate  that  the  main 
lineages  were  already  well  differentiated  by 
about  38  Mya  (Poinar,  1992),  which  is  con¬ 
cordant  with  the  notion  of  a  long  history  of 
mosquitoes. 

The  hypotheses  presented  above  are  con¬ 
gruent  with  all  the  available  phylogenetic 
and  biogeographic  evidence.  Because  they 
are  based  on  analyses  of  relatively  small  sam¬ 
ples  of  taxa,  further  studies  of  Anophelinae 
with  extended  sampling  are  needed  to  test 
them.  Careful  sampling  of  representatives 
of  subgenera  Cellia  and  Anopheles  is  prob¬ 
ably  the  key  to  a  better  understanding  of 
the  biogeographic  patterns  within  the  genus 
Anopheles. 